
Computer Engineering


Prompt-tuning Hierarchical Network for Few-shot Multimodal Relation Extraction


  • Published: 2025-04-01


Abstract: Multimodal Relation Extraction (MRE) methods assist relation extraction by exploiting multimodal information. To achieve good relation extraction performance, existing MRE models usually require effective training on large amounts of labeled data, and they perform poorly in few-shot settings. To address this issue, this paper fully exploits the rich semantic and prior knowledge contained in relation labels and proposes a new prompt-tuning hierarchical network. First, a text prompt module based on knowledge injection is designed and implemented: the entity information implicit in relation labels is exploited and virtual entity type words are introduced to construct prompt templates, so that the model can perceive the potential range of entity types in a sample, while the introduced virtual relation answer words are continuously optimized against the context to express the best semantic information, improving performance in the few-shot setting. Second, exploiting the mutual constraints between entity pairs and relations, an entity-relation collaborative optimization module is designed and implemented to further improve relation extraction. Finally, a visual prefix-based attention mechanism is introduced into every self-attention layer of the text encoder, deeply fusing hierarchical multi-scale visual features with the textual information to generate more effective and robust text representations and significantly reduce the model's sensitivity to errors. Experimental results on the Multimodal Neural Relation Extraction (MNRE) dataset show that the model reaches a precision of 84.97%, a recall of 83.91%, and an F1 score of 84.43%, all improvements over the baseline models. In the few-shot setting in particular, the proposed model significantly outperforms the baselines, demonstrating good relation extraction performance.
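The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of what a knowledge-injected cloze prompt of this kind can look like. The marker tokens [sub], [obj], and [MASK], the template layout, and the function name build_prompt are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a knowledge-injected prompt template.
# [sub]/[obj] stand in for the virtual entity type words and [MASK] is the
# slot scored against learnable virtual relation answer words; all three are
# illustrative placeholders, not the paper's actual vocabulary.

def build_prompt(sentence: str, head: str, tail: str) -> str:
    """Append a cloze-style template to the input sentence.

    The embeddings of [sub] and [obj] would be initialized from the
    entity-type information implicit in the relation labels, so the model
    can perceive the potential range of entity types; [MASK] is filled by
    one learnable virtual answer word per relation label rather than a
    hand-picked natural-language verbalizer.
    """
    return f"{sentence} [sub] {head} [MASK] [obj] {tail}"

print(build_prompt("Steve Jobs co-founded Apple in Cupertino.",
                   "Steve Jobs", "Apple"))
# -> "Steve Jobs co-founded Apple in Cupertino. [sub] Steve Jobs [MASK] [obj] Apple"
```

The appeal of such a scheme in the few-shot setting is that the virtual tokens are trainable: instead of committing to a hand-written verbalizer, the answer-word embeddings are tuned against the context so that filling [MASK] discriminates between relation labels.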
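Similarly, the visual prefix-based attention can be read as a text self-attention layer whose key/value sequences are extended with projected visual features, so every text token can attend to the image without changing the text sequence length. Below is a minimal PyTorch sketch under that assumption; the class name, projection layers, and dimensions are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class VisualPrefixSelfAttention(nn.Module):
    """Self-attention over text tokens in which projected (e.g. multi-scale)
    visual features are prepended to the keys and values as a visual prefix.
    Hypothetical sketch; not the paper's implementation."""

    def __init__(self, hidden_dim: int, visual_dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.q_proj = nn.Linear(hidden_dim, hidden_dim)
        self.k_proj = nn.Linear(hidden_dim, hidden_dim)
        self.v_proj = nn.Linear(hidden_dim, hidden_dim)
        # Project visual features into the key/value space for the prefix.
        self.vis_k = nn.Linear(visual_dim, hidden_dim)
        self.vis_v = nn.Linear(visual_dim, hidden_dim)
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text:   (batch, seq_len, hidden_dim) token states from the text encoder
        # visual: (batch, n_vis, visual_dim)   visual feature vectors
        B, T, _ = text.shape
        q = self.q_proj(text)
        # Prepend the projected visual prefix to keys and values only;
        # queries (and hence the output length) stay purely textual.
        k = torch.cat([self.vis_k(visual), self.k_proj(text)], dim=1)
        v = torch.cat([self.vis_v(visual), self.v_proj(text)], dim=1)

        def split(x):  # (B, L, H) -> (B, heads, L, head_dim)
            return x.view(B, x.size(1), self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out)

layer = VisualPrefixSelfAttention(hidden_dim=768, visual_dim=2048)
text = torch.randn(2, 32, 768)     # e.g. BERT-sized token states
visual = torch.randn(2, 12, 2048)  # e.g. pooled multi-scale CNN features
print(layer(text, visual).shape)   # torch.Size([2, 32, 768])
```

Because the prefix enters only through the keys and values, the text representation is conditioned on the image at every layer while the rest of the encoder stack is left untouched, which is consistent with the abstract's claim of more robust text representations.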
